improvements to parse_dtype #3264

Merged
merged 21 commits into from
Jul 22, 2025

Conversation

d-v-b
Contributor

@d-v-b d-v-b commented Jul 17, 2025

  • Add a new function parse_dtype. parse_data_type is kept around but it just wraps parse_dtype. The reason for this change is naming consistency -- the ZDType methods already use the "dtype" abbreviation extensively, so it's potentially confusing that parse_data_type does not.
  • Handle strings and sequences as potential JSON-like inputs. Adds tests to ensure that the JSON form of a dtype is a valid argument to parse_dtype (with the exception of "|O", which is ambiguous).
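
The two points above (the old name delegating to the new one, and the JSON form of a dtype being accepted back by the parser) can be sketched without zarr installed. Only the names parse_dtype, parse_data_type and the zarr_format parameter come from this PR; ToyInt32, the string encodings and the function bodies are hypothetical stand-ins, not zarr's implementation.

```python
from dataclasses import dataclass
from typing import Literal

ZarrFormat = Literal[2, 3]


@dataclass(frozen=True)
class ToyInt32:
    """Hypothetical stand-in for a ZDType; not a zarr class."""

    def to_json(self, zarr_format: ZarrFormat) -> str:
        # Toy encoding (assumption): a NumPy-style string for v2, a plain name for v3.
        return "<i4" if zarr_format == 2 else "int32"


def parse_dtype(dtype_spec: object, *, zarr_format: ZarrFormat) -> ToyInt32:
    """Toy parser: accepts an instance, or the JSON form produced by to_json."""
    if isinstance(dtype_spec, ToyInt32):
        return dtype_spec
    accepted = {2: "<i4", 3: "int32"}
    if dtype_spec == accepted[zarr_format]:
        return ToyInt32()
    raise ValueError(f"unrecognized dtype spec: {dtype_spec!r}")


def parse_data_type(dtype_spec: object, *, zarr_format: ZarrFormat) -> ToyInt32:
    """Old name kept for backwards compatibility; it only delegates."""
    return parse_dtype(dtype_spec, zarr_format=zarr_format)


def round_trips(zarr_format: ZarrFormat) -> bool:
    """Check that parsing the JSON form gives back an equal dtype."""
    dtype = ToyInt32()
    as_json = dtype.to_json(zarr_format=zarr_format)
    return parse_dtype(as_json, zarr_format=zarr_format) == dtype
```

In this sketch, round_trips(2) and round_trips(3) both hold, and calling either function name with the same arguments gives equal results.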

closes #3263

…more JSON-like inputs, and test for round-trips
@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jul 17, 2025
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Jul 17, 2025
@d-v-b
Contributor Author

d-v-b commented Jul 17, 2025

cc @TomNicholas

@d-v-b d-v-b requested a review from a team July 17, 2025 14:23
@d-v-b d-v-b changed the title improvments to parse_dtype improvements to parse_dtype Jul 17, 2025

codecov bot commented Jul 17, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 59.65%. Comparing base (fd5425b) to head (8d2c0f0).
Report is 1 commits behind head on main.

Files with missing lines          Patch %   Lines
src/zarr/core/dtype/__init__.py   75.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3264      +/-   ##
==========================================
+ Coverage   59.56%   59.65%   +0.08%     
==========================================
  Files          78       78              
  Lines        8684     8690       +6     
==========================================
+ Hits         5173     5184      +11     
+ Misses       3511     3506       -5     
Files with missing lines          Coverage Δ
src/zarr/core/array.py            69.02% <100.00%> (ø)
src/zarr/dtype.py                 0.00% <ø> (ø)
src/zarr/core/dtype/__init__.py   30.00% <75.00%> (+5.92%) ⬆️

... and 4 files with indirect coverage changes

@d-v-b
Contributor Author

d-v-b commented Jul 17, 2025

d684ada adds a test to ensure that parse_dtype is the same as parse_data_type

Contributor

@dstansby dstansby left a comment

Nice - I like the name change. Having two identical functions in our API seems a bit confusing from a user POV: https://zarr--3264.org.readthedocs.build/en/3264/api/zarr/dtype/index.html#functions. Could you remove parse_data_type from __all__ so it's removed from the docs, but will still be imported and work for backwards compatibility?
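
A minimal sketch of the effect being asked for, using a throwaway in-memory module. The module name toy_dtype and the function bodies are hypothetical; this is not zarr.dtype. It relies only on standard Python behaviour: `__all__` controls what a star import exposes, while an explicit import of an unlisted name still works.

```python
import sys
import types

# Source for a throwaway module; "toy_dtype" is a hypothetical name, not zarr.dtype.
source = '''
__all__ = ["parse_dtype"]  # the old alias is deliberately left out

def parse_dtype(spec):
    return f"parsed:{spec}"

def parse_data_type(spec):
    return parse_dtype(spec)
'''

toy = types.ModuleType("toy_dtype")
exec(source, toy.__dict__)
sys.modules["toy_dtype"] = toy

# A star import only pulls in the names listed in __all__ ...
star_namespace: dict = {}
exec("from toy_dtype import *", star_namespace)
star_names = {name for name in star_namespace if not name.startswith("__")}

# ... but an explicit import of the old name keeps working.
from toy_dtype import parse_data_type

print(star_names)
print(parse_data_type("int32"))
```

Whether a documentation generator hides the unlisted name depends on how it is configured; the comment above indicates that this project's API docs follow `__all__`.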

@d-v-b
Contributor Author

d-v-b commented Jul 22, 2025

a notable recent change: I widened the accepted inputs even further, and now parse_dtype / parse_data_type will accept the output of dtype.to_json(zarr_format=2), which allows you to unambiguously request a vlen string dtype for zarr v2 data via a dict like {"name": "|O", "object_codec_id": "vlen-utf8"}
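
Why the dict form removes the ambiguity can be illustrated with a toy lookup. This is only a sketch under stated assumptions: resolve_v2, the lookup table and the labels on the right-hand side are hypothetical and are not zarr's resolution logic. The dict shape with "name" and "object_codec_id" is taken from the comment above.

```python
# Hypothetical table keyed on (name, object_codec_id); the labels are illustrative only.
V2_OBJECT_DTYPES = {
    ("|O", "vlen-utf8"): "variable-length string",
    ("|O", "vlen-bytes"): "variable-length bytes",
}


def resolve_v2(spec: object) -> str:
    """Toy resolver: a bare "|O" is rejected, a dict with a codec id is accepted."""
    if isinstance(spec, str):
        name, codec_id = spec, None
    elif isinstance(spec, dict):
        name, codec_id = spec["name"], spec.get("object_codec_id")
    else:
        raise TypeError(f"unsupported spec type: {type(spec).__name__}")

    if name == "|O" and codec_id is None:
        # The object dtype alone does not say which kind of object is stored.
        raise ValueError('"|O" alone is ambiguous; an object_codec_id is needed')

    try:
        return V2_OBJECT_DTYPES[(name, codec_id)]
    except KeyError:
        raise ValueError(f"unknown combination: {(name, codec_id)!r}") from None


print(resolve_v2({"name": "|O", "object_codec_id": "vlen-utf8"}))
```

In the toy table, two entries share the name "|O" and differ only in the codec id, which is the reason a bare "|O" string cannot be resolved.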

@d-v-b d-v-b requested a review from dstansby July 22, 2025 10:23
directly, or a JSON representation of a ZDType, or a native dtype, or a python object that
can be converted into a native dtype.
zarr_format : ZarrFormat
The Zarr format version. This parameter is required because this function will attempt to
Contributor

A stretch goal (for another issue/PR): would it be possible to make this optional for certain inputs? For example, converting 'int32' doesn't depend on the version of the Zarr spec.

Contributor Author

I would rather not have a function where one parameter is optional depending on the value of another parameter. That should probably be a different function entirely, e.g. one that specifically takes numpy dtype-like inputs and finds the corresponding zarr data type.

d-v-b and others added 5 commits July 22, 2025 13:22
Co-authored-by: David Stansby <dstansby@gmail.com>
Co-authored-by: David Stansby <dstansby@gmail.com>
Co-authored-by: David Stansby <dstansby@gmail.com>
Co-authored-by: David Stansby <dstansby@gmail.com>
Co-authored-by: David Stansby <dstansby@gmail.com>
d-v-b and others added 2 commits July 22, 2025 13:27
@dstansby dstansby merged commit a27d4d6 into zarr-developers:main Jul 22, 2025
30 checks passed
@d-v-b d-v-b deleted the widen-parse-data-type branch July 22, 2025 12:16
Development

Successfully merging this pull request may close these issues.

incomplete round-tripping of v3 data type json